Technical Assessment Report
1.0 Notification and Authorization 

The NASA Engineering and Safety Center (NESC) set out to utilize data mining and trending 
techniques to review the anomaly history of the International Space Station (ISS) and provide 
tools for discipline experts not involved with the ISS Program to search anomaly data to aid in 
identification of areas that may warrant further investigation. Additionally, the assessment team 
aimed to develop an approach and skillset for integrating data sets, with the intent of providing 
an enriched data set for discipline experts to investigate that is easier to navigate, particularly in 
light of ISS aging and the plan to extend its life into the late 2020s. 

Mr. Robert Beil, NESC Systems Engineering Office (SEO), NASA Kennedy Space Center 
(KSC), was selected to lead this assessment. The key stakeholders for this assessment were 
Mr. Timmy Wilson, Director, NESC, and Mr. Michael Suffredini, Manager, ISS Program Office. 

4.0 Executive Summary

The objective of this assessment was to utilize data mining and trending techniques to review the 
anomaly history of the International Space Station (ISS) and provide tools for discipline experts 
not involved with the ISS Program to search anomaly data to aid in identification of areas that 
may warrant further investigation. Previous NASA Engineering and Safety Center (NESC) data 
mining and trending assessments [ref. 1] performed analysis on data contained in individual 
anomaly recordkeeping systems (i.e., databases). However, ISS anomalies and 
nonconformances are documented in multiple databases. The assessment team prepared and 
integrated pertinent ISS nonconformance data from multiple sources and provided an enriched 
data set that was easier to navigate and use. 

The data trending goals were to: 

• Demonstrate the capability to trend ISS anomaly data from multiple data sets. 

• Provide a means for discipline experts to gain deeper insight into ISS anomaly data. 

• Provide fresh insight into ISS problem trends and significant anomalies, as able within 
the assessment timeline. 

• Learn successful approaches to assist discipline experts in trending across multiple,
merged data sets. 

The timeframe for the assessment was approximately 1 year to accomplish these goals; however, 
the goals were not fully met. The preparation, integration, mining, and presentation of the ISS 
data took longer than expected, with little time left to perform in-depth analysis with the 
discipline experts. This report documents the activities completed to date and focuses on 
documenting the tasks of data preparation, integration, text mining, and visualization. Additional 
analysis of the ISS data is recommended and will continue outside this assessment. 

The team completed extraction of pertinent data fields from the six nonconformance data sets 
and installed the merged data on a secure Microsoft® SharePoint® site, with security restrictions 
and controlled access. Colocating the nonconformance data from different reporting systems 
was an important first step in enabling trending analysis and data mining of the nonconformance 
records. The data sets included: 

• Problem reporting and corrective action (PRACA) and items for investigation (IFI)
data, both included in the ISS Problem Analysis Resolution Tool (PART)

• Government-furnished equipment (GFE) discrepancy reports (DRs) and GFE PRACA
from the Quality Assurance Record Center (QARC)

• Mission Operations Directorate (MOD) Anomaly Reports (ARs) 

• Software Change Requests (SCRs) 

• Maintenance Analysis Data Set (MADS) 

Given the different designs of these data sets, transformation of the data was necessary
(i.e., storing it in proper format or structure to enable querying and analysis). In some cases, this
was as simple as normalizing the names of like fields. In cases where fields were nonexistent for
one or more of the data sets, this step was more complicated. The data sets all have free
(unstructured) text fields (e.g., title, description) and prescribed (structured) fields (i.e., pull-down
menus for trend code selection and many other types of selection). The IFIs and ARs,
however, have few prescribed fields and do not include codes for types of failure modes, defects,
or causes, so additional steps were required to improve the search.

Data- and text-mining approaches were used to enrich the data. These approaches convert 
information in text fields into indexing data or topics. The topics discussed in the text fields 
could then be used to search and filter the data sets, to find similar anomaly reports that might 
otherwise be missed. These topics could also be used to develop topic-based codes for failure
modes, defects, or other commonly used codes in some of the data sets. 

For data and text mining on individual or merged data sets, the goal was to provide discipline 
experts with better access to pertinent ISS anomaly data by converting topics from free-text 
fields into indexable data. Terms, concepts, and topics identified in text mining would be 
integrated into the merged data set to improve search for relevant reports. 

• Statistical text and data mining would identify terms (often topics) in the text fields 
(e.g., in titles and problem descriptions) that were similar between correlated reports. 

• Semantic text mining would identify concepts (topics) that occurred in text fields and use 
them to index reports in the data set. These topics would be taken from a large set of 
possible topics and would, therefore, be common across data sets. These topics also 
could be used to define standard proxies for trend codes such as failure mode codes. 
Trend codes are used slightly differently across some of the data sets and could be 
applied to all sets, including those where these codes have not been used. 

Significant progress was made in the use of semantic text mining techniques to enrich the data 
and improve capabilities to search and filter reports. Semantic text mining uses a NASA tool, 
the Semantic Text Analysis Tool (STAT), which parses sentences in free text and then matches 
nouns, verbs, and modifiers with concepts (i.e., topics) that are represented in the NASA 
Aerospace Ontology. The ontology is a large hierarchical data structure that is designed to 
recognize multiple words and phrases used in free text in aerospace to denote thousands of types 
of entities, properties, actions, and problems. These concepts are equivalent to a common index 
for all reports in the merged data set. This text-mining approach was used and its accuracy was 
verified in a previous project [ref. 2] on analysis of DRs from QARC. 

The results of the text analysis — a set of topics associated with each data record — were reported 
in formats that were integrated into the merged data set. A method was defined for using these 
topics to expand search to more relevant items, so that fewer of them would be missed in regular 
searches. This method has not yet been rigorously tested. 

The set of topics associated with each data record was used to develop topic-based rules for
proxy failure mode and defect code fields. This was a second use of the results of semantic text 
analysis. Establishing identical trend code fields across data sets aids standard search. It was 
also expected that proxy codes would help overcome the manual coding limitation to select only
one code when multiple codes would be appropriate. GFE PRACA trend codes were chosen as 
the standard codes for all data sets. Several approaches for defining proxy codes were tried, 
including a statistical machine learning approach. Supporting extensions to STAT were 
developed, and additional software was developed for preliminary evaluation of the accuracy of 
the proxy codes during their definition. Proxy codes were delivered for the two PRACA sets:
IFIs and MOD ARs.

During this development, cases of wholesale errors in some manual codes were discovered.
It became clear that the manual codes should have been vetted. Given the low accuracy of some
manual codes, the statistical machine learning approach, which was used to define rules for 
proxy defect codes, should be rejected until vetting of manual codes results in selection of 
accurate training sets. The extended nature of this work left little time for vetting and evaluation 
of the accuracy or helpfulness of the topics associated with each data record extracted by STAT. 

Two types of tools were customized for searching, browsing, and visualizing the data set to 
provide multiple perspectives on the data, with the goal of supporting further independent 
analysis. Tableau®, a search and data visualization tool for business analytics, was customized to 
provide data team members and discipline experts with interactive dashboards and 
multidimensional report browsers for exploring the merged data. It was demonstrated that 
Tableau® could be used to identify trends in nonconformances across the merged data set. 

Flamenco, an open-source search and visualization tool for multidimensional search, was 
customized (Flamenco+) to use the hierarchical indexes provided by STAT and the Aerospace
Ontology for the data sets. Flamenco+ was also adapted for evaluating codes and analyzing
trends. Corresponding STAT adaptations were made to provide output to support use of
Flamenco+ for evaluation of proxy codes. The NESC assessment team was not able to fully
realize strategies for information retrieval based on concept tag indexing and multidimensional
faceted search using Flamenco. Integrated use of Flamenco+ and Tableau® was not explored but
is feasible and promising. 

The SharePoint® site enables discipline experts to go to one location to access the data and to 
then search across the data sets simultaneously. Several topics were investigated in the enriched 
merged data set. They include nonconformances in Extravehicular Mobility Unit (EMU) water 
separator fan bearings and harmonic drive/peristaltic pumps. Initial limited analysis was 
performed for software, human factors, and electrical power systems. SAS® Text Miner was 
used for some analyses to capture topics mentioned in text fields and structured fields, to guide 
search. Slow integration of results of semantic text mining did not leave enough time to define 
and evaluate methods using this topic information. Lessons from exploration of these discipline
areas have been documented to improve future trending and data mining. 

Late in the project, a new use of the data by Safety and Mission Assurance (S&MA) personnel
and other interested organizations was identified. The objective would be to relate anomaly 
record information from the ISS merged anomaly data set to potential risks and hazards defined 
in the ISS Hazard Analysis System. Given a hazard of concern or interest, historical anomalies 
that have occurred (that may have led to the occurrence of the hazard) and their risk ranking,
perhaps by the number of related anomaly counts, could be compiled. These incidents may be 
reviewed, counted, and trended to raise awareness and to assess whether preventive actions 
would be prudent. The ability to search across several databases to identify relevant incidents is 
a key attribute to finding a more complete set of incidents for analysis. Further work is needed 
to define the use scenario and to evaluate the usefulness of the tools for this scenario. 

This activity demonstrated use of the tool suite for deep investigations into technical issues 
related to focused problems. The team developed a tool suite framework (i.e., merged and 
enriched data, software, user interfaces, methodologies, processes, and practices) that can inform 
the potential expansion into other program/project data sets and support periodic updates of ISS 
problem-related data for ongoing interactive analyses by Technical Discipline Teams (TDTs). 

5.0 Assessment Plan

The objective of this assessment was to utilize data mining and trending techniques to review the 
anomaly history of the ISS and provide tools for discipline experts not involved with the ISS 
Program to search anomaly data to aid in identification of areas that may warrant further 
investigation. A challenge to investigating anomalies is that there are several problem reporting 
systems that hold data of interest, and the reporting systems do not have the same key data fields. 
The assessment team wanted to develop an approach to navigate through multiple problem 
reporting data sets simultaneously. 

The assessment had four high-level goals: 

• Demonstrate the capability to trend anomaly data utilizing multiple data sets. 

• Provide a means for discipline experts to gain deeper insight into ISS anomaly data. 

• Provide fresh insight into ISS problem trends and significant anomalies, as able within 
the assessment timeline. 

• Learn successful approaches to assist discipline experts in trending across multiple, 
merged data sets. 

To accomplish these goals, the assessment team established the following basic approach: 

• Develop a method to capture integrated problem reporting data. 

• Develop a capability to search for problem trends and effectively display meaningful 
trend data. 

• Utilize semantic data mining to provide conceptual indexing and missing failure and 
defect codes. 

• Establish a capability for discipline experts to search ISS data across multiple anomaly 
databases. 

• Identify trends and significant issues from targeted reviews of software, electrical power, 
mechanisms, and human factors disciplines. 

• Document the data mining and trending development effort to inform potential follow-on 
capability for cross-program/project trending. 

6.0 Description of Data Subteam Tasks

The NESC assessment team consisted of two subteams — the data subteam and the discipline 
expert subteam. Section 6.0 describes the data subteam’s effort. 

6.1 Team Methodology 

The data subteam prepared the nonconformance data for further analysis, delivered the initial 
analysis, and aided discipline experts with their investigations. The discipline expert subteam 
utilized the initial analysis and data/tools to further investigate for adverse trends or significant 
anomalies. 

One of the major ambitions of the assessment was to create a tool suite that discipline experts
could use to investigate anomaly history and perform data mining across multiple ISS anomaly
databases. The NESC assessment team foresaw many potential uses, such as looking at data
trends across multiple systems, supporting root cause investigations or unique technical 
assessments, or providing supporting data for looking at precursors to failures. 

The NESC assessment team first established a concept of operations for discipline expert use of 
the search tool(s) (see Appendix A). The concept of operations shows how the merged data 
product can be used to serve discipline experts in researching issues concerning ISS anomalies. 
Four potential discipline expert use cases were identified to support development of the
enhanced data-mining tool. These use cases are described in further detail in Appendix A. 

• Scenario 1: Identify recurring anomalies and emergent risks. 

• Scenario 2: Provide in-depth problem investigation in support of an NESC assessment. 

• Scenario 3: Associate a potential issue or hazard to the historical operational anomalies or 
failures that could have led to the realization of the hazard. 

• Scenario 4: Provide supporting data for precursor analysis. 

Late in the assessment timeframe, when the tool suite was maturing, the data subteam worked
with discipline experts in the areas of software, human factors, electrical power, and
mechanisms. The general interaction with discipline experts is illustrated in Figure 6.1-1. Once
the initial set of anomaly data was extracted from the multiple source databases and merged, 
visualization tools were used to build views and dashboards to support the discipline expert 
analyses. This initial set of discipline experts provided feedback for tool enhancements. 



Figure 6.1-1. General Interaction with Discipline Experts to Support Analysis of ISS Anomalies

6.2 Data Sources

6.2.1 Anomaly and Problem Reporting Data Sources 

The data sources selected for this ISS assessment consisted of GFE DRs and GFE PRACA from
the QARC; PRACA and IFI data from the ISS PART; and MOD ARs. Each data source was
selected by the NESC assessment team with the intent of providing data that would give insight 
into recurring or significant problems. The fields from these databases often did not overlap 
(i.e., freeform fields versus drop-down fields, handling of part numbers, serial numbers, etc.). 
This complicated merging of the data, as it limited which fields were selected for merger and 
drove effort to create new common fields, in some instances using the semantic mining 
techniques described later in this report. For instance, the MOD AR database generally had few 
fields compared with the other databases, and no defect codes or failure codes. 

An additional complication was the manner in which each database handled anomaly 
reoccurrences. This is significant when trending counts of occurrences. MOD AR reoccurrences 
are typically added to an existing record with no indication that there is/is not a reoccurrence, or 
how many — the record must be opened and reviewed. Additionally, in some cases, records such 
as IFIs are upgraded to PART PRACAs and/or GFE PRACAs or DRs. This must be accounted 
for during analysis as well. 

The nonconformance database record counts ranged from 3,992 to 220,006 records per data set. 
One of the main drivers for the differences observed in the counts across databases was the 
manner in which problems are recorded. Flight databases (i.e., PART PRACA/IFI and MOD 
AR) typically only generate a record against the offending part or problem, while the GFE data 
sets often delve deeper into a nonconformance and spawn separate records for the 
subcomponents and/or all serial numbers of an offending part and/or its components. 

6.2.2 Additional Data Sources 

Additional data sources were made available to further support anomaly investigation. These 
included the SCR data and the MADS. The SCR data provided deeper insight into flight 
problems that were transferred there for further troubleshooting or, in some cases, design 
changes. The MADS data were used to gain insight into the hardware that was or had been on 
orbit (see Table 6.2-1). Fields from both were added to Tableau® to provide cross-referencing 
while performing search and visualization. 


Table 6.2-1. Ancillary Data Sources

Data Source    Record Count
SCR            40,361
MADS           1,921

6.3 Data Extract, Transform, and Load (ETL)

A goal of this assessment was to perform data mining across multiple data sources. To establish 
this capability, data had to be extracted from each data source. The data were transformed into a 
common set of fields and loaded into a single database, enabling data mining and trending. This 
multistep process is referred to as ETL and is shown in Figure 6.3-1. 



Figure 6.3-1. ISS Data Sets Extraction, Transformation, and Load 

6.3.1 Extract 

Data extraction is the act of retrieving data from the desired data sources for further processing
and subsequent storage. Extracting data from ISS data sources had challenges because of 
differing formats, security access, and understanding how the various fields were used (e.g., 
fields with the same name may have different content, and fields with the same content may have 
different field names). 

Nonconformance records from the GFE DR, PART PRACA, and PART IFI databases were
extracted using their web interfaces by running a single report that was output in Excel® format. 
Accessing MOD AR data was more challenging because it was accomplished by running reports 
from the database web interface for each ISS increment and then exporting the individual 
nonconformances in the increment report to an Excel® file. Each Excel® file was then combined 
into a single file. Access and extraction of data from the data sources required contacting the 
data owners, requesting access to the data, and meeting the owner’s security requirements. 
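
The per-increment export and combine step described above can be sketched as follows. This is a minimal illustration, not the team's actual procedure: it assumes the exports are available as CSV text rather than Excel® workbooks, and the column names (`Record Number`, `Title`), the added `Increment` column, and the sample rows are all hypothetical.

```python
import csv
import io


def combine_exports(exports):
    """Concatenate per-increment CSV exports into one list of row dicts.

    `exports` maps a hypothetical increment label to the CSV text of that
    increment's report. Each row is tagged with the increment it came from.
    """
    combined = []
    for increment, text in exports.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["Increment"] = increment  # keep provenance of each row
            combined.append(row)
    return combined


# Hypothetical sample exports (not real ISS data).
exports = {
    "Increment 01": "Record Number,Title\n101,Sample anomaly A\n",
    "Increment 02": "Record Number,Title\n205,Sample anomaly B\n206,Sample anomaly C\n",
}
rows = combine_exports(exports)
```

Tagging each row with its source increment is a design choice made here so that a combined file can still be traced back to the individual report it came from.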

The data extracted from the data sources were static, so the data were current only from the day 
the data were retrieved. Table 6.3.1-1 shows the data extraction date for each record system.

Table 6.3.1-1. Data Extraction Date for Each Record System

Data Set          Extraction Date
GFE DR            September 24, 2014
GFE PRACA         June 24, 2014
PART PRACA/IFI    January 7, 2015
MOD AR            January 7, 2015
MADS              June 30, 2014
SCRs              July 31, 2014


The extraction of GFE PRACA was performed using a standalone Microsoft® 2008 Server and 
by building a Microsoft® SQL 8 database. The data set had Shuttle data and crossover data 
(i.e., both ISS and Shuttle), so a Microsoft® query was built to extract only ISS data. The queries 
were improved as the NESC assessment team vetted the data. For example, some adjustments
were needed when it was noticed that not all of the extravehicular activity (EVA) data were 
retrieved in the initial queries. This was found during early analysis and corrected. 
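
A filter of this kind, together with a vetting check on what it leaves out, might look like the sketch below. It uses Python's built-in `sqlite3` module in place of the Microsoft® SQL database the team used, and the table name `gfe_praca`, the `program` column, and its values are assumptions made for illustration only. Whether crossover records belong in an ISS extract is a judgment call; this sketch excludes them and then reports the excluded counts so the choice can be reviewed.

```python
import sqlite3

# In-memory stand-in for the extracted data set (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gfe_praca (record_number TEXT, program TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO gfe_praca VALUES (?, ?, ?)",
    [
        ("A-1", "ISS", "Sample ISS record"),
        ("A-2", "SHUTTLE", "Sample Shuttle record"),
        ("A-3", "ISS/SHUTTLE", "Sample crossover record"),
    ],
)

# Keep only records coded to the ISS program.
iss_rows = conn.execute(
    "SELECT record_number FROM gfe_praca WHERE program = ? ORDER BY record_number",
    ("ISS",),
).fetchall()

# Vetting aid: count what the filter excluded, grouped by program value,
# so that unexpectedly missing categories can be spotted.
excluded = conn.execute(
    "SELECT program, COUNT(*) FROM gfe_praca WHERE program <> ? "
    "GROUP BY program ORDER BY program",
    ("ISS",),
).fetchall()
```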

SAS® Enterprise Guide was used to review and set up the large data sets that combined 
visualization and search in Tableau®. Tableau® visualization was used for early data quality 
control. Data discrepancies were easier to find using visualization. 

Data owners were instrumental in providing road maps to the data and providing the 
documentation required to help the team make decisions on which fields to use. They provided 
data code manuals, reports, supporting documentation, and data dictionaries. 

The initial extraction included 353 fields from five different problem reporting data sources. 
After review by the data subteam, the number of fields was reduced to 209 fields. The data
subteam further consolidated those into 36 fields. This review identified fields required to
combine five different ISS problem reporting data sources into one source. Many of the 
discarded fields were system-generated fields that controlled the document status or the date and 
time transaction. Additionally, many of the excluded fields were specific to processing data 
within that data source, as in a document workflow. 

There are two distinct field types: structured fields and unstructured fields. Structured fields 
have predetermined options available for selection (e.g., codes and code descriptions). Usually, 
these are dropdown menus from which a user selects. Unstructured fields accept freeform data,
with little or no organization. For example, a field entitled “problem description” typically
allows freeform entry of a prescribed number of characters. These free-text fields caused
challenges for data mining, due to spelling errors, acronyms, special characters, and other text
irregularities. There were four unstructured fields used in the combined data set: Problem Title,
Problem Description, Detected During, and Part Description, as shown in Table 6.3.1-2. This
table also lists the structured fields used to separate problem reporting documentation into 
categories that could be searched for trending and analysis. 

Table 6.3.1-2. Structured and Unstructured Fields Used in Merged Data Set

Reporting Codes (Structured Fields)
• Program          • Subsystem            • Defect
• Project          • Flight Element       • Failure Mode
• Cause            • System               • Prevailing Condition
• Disposition      • Test Operation       • Recurrence Control

Unstructured Fields
• Problem Title    • Problem Description  • Detected During
• Part Description




Even though many fields were not used in the merged data set, links were provided (in Tableau®) 
to the original data source and added to each record to provide for a more in-depth analysis of 
individual records, if necessary. Each complete record could be viewed by following links to the 
original data source web site. 

6.3.1.1 Extraction of Additional Data Sources Fields

Two additional data sources were used to further support nonconformance data analysis: the SCR
and the MADS (see Table 6.3.1-3) data sets. SCRs document software updates (which are
sometimes initiated by nonconformances), and MADS is used for capturing hardware
maintenance activities. The MADS and SCR data sources were then blended with the related
problem reports. Blending did not result in adding fields to the merged data set, but supported
looking up related details. For example, a problem report may refer to a part number that could
then be examined further by searching the MADS data.

Table 6.3.1-3. SCR and MADS Data Fields

SCR Data Fields          MADS Data Fields
• Reason for Change      • Part Number
• Subsystem              • Location
• Test Environment       • Flight Activated
• ISS SCR Number         • Unique ID
• Status                 • Old Part Number
• Provider               • Part Name
• Originator Stage       • Hardware Criticality
• CSCI                   • Flight Manifested
• Created Date           • Type Name
• Title                  • System
• Board                  • Function
                         • EVA or IVA Overhead Time
                         • Type of Part


6.3.2 Transform 

6.3.2.1 Data Source Fields Transformed 

The review of the five data sources consolidated 353 fields into 36 transformed fields.
These fields were chosen based on the relevance of the data for trending and subsequent insight
into trends and significant problems. Where there were different field names with the same data
types, those field names were transformed as shown in the example in Figure 6.3.2-1.


[Figure label recovered: Report Number]

Figure 6.3.2-1. Transformed Fields Examples
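
The renaming of like fields to a common name can be illustrated with a small lookup table. This is a sketch under stated assumptions: "Report Number" appears in the figure and "Record Number" is one of the transformed fields, but which source each original name belongs to, and the names `AR Number` and `DR Number`, are hypothetical.

```python
# Hypothetical mapping from (source, original field name) to common field name.
FIELD_MAP = {
    ("PART PRACA", "Report Number"): "Record Number",
    ("MOD AR", "AR Number"): "Record Number",
    ("GFE DR", "DR Number"): "Record Number",
}


def normalize_record(source, record):
    """Rename source-specific fields to common names; pass other fields through."""
    return {
        FIELD_MAP.get((source, name), name): value
        for name, value in record.items()
    }


normalized = normalize_record("MOD AR", {"AR Number": "631", "Title": "Sample"})
```

Keying the map on the source as well as the field name reflects the point made in Section 6.3.1 that fields with the same name may hold different content in different systems.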

A new field, Database Name, was added to help facilitate record integrity where records from
two data sources had the same data identifiers but had no relationship, as in Record Numbers
with PART IFI and MOD AR. Figure 6.3.2-2 shows the methodology that was used to maintain
record integrity when combining those data.


Identical Record Number in both data sources; new field (Database Name) added:

Record Number   Database Name   Problem Title
631             PART IFI        Frayed Retractable Tether Cord (EVA)
631             MOD AR          Vozdukh Vacuum Valve 1 Fail

Figure 6.3.2-2. Field Addition for Record Integrity Example
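
The effect of the added field can be sketched as a composite key. The example below is illustrative only: the function name and the placeholder titles are hypothetical, and only the colliding Record Number (631) and the two source names come from the figure.

```python
def merge_with_source(sources):
    """Merge records from several sources, keyed by (Database Name, Record Number)."""
    merged = {}
    for database_name, records in sources.items():
        for record in records:
            # Add the new field without modifying the caller's record.
            tagged = {**record, "Database Name": database_name}
            key = (database_name, record["Record Number"])
            if key in merged:
                raise ValueError(f"duplicate record within one source: {key}")
            merged[key] = tagged
    return merged


# Two unrelated records that share a Record Number (placeholder titles).
sources = {
    "PART IFI": [{"Record Number": "631", "Problem Title": "placeholder title 1"}],
    "MOD AR": [{"Record Number": "631", "Problem Title": "placeholder title 2"}],
}
merged = merge_with_source(sources)
```

Without the source name in the key, the second record 631 would silently overwrite the first; with it, both survive and remain distinguishable.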

The full list of transformed fields is shown in Table 6.3.2-1, including three added fields
(Sub Ontologies, CTags (concept tags), and CTag Count), which are explained in Section
6.4.3.1.


Table 6.3.2-1. Transformed Fields

• Record Number               • System Code                • Project Code
• Originator                  • Site Location              • Problem Title
• Status                      • Hardware Type              • Cause Code
• Program Code                • Flight                     • Cause Description
• Detected Date               • Defect Code                • Like HW On Orbit
• Detected During             • Defect Description         • Part Number
• Disposition Code            • Failure Mode Code          • Part Description
• Manufacturer                • Failure Mode Description   • Serial Number Lot
• Prevailing Condition Code   • Responsible Org            • Database Name
• Recurrence Control Code     • Activity                   • Related Document
• Test Operation Code         • Hardware Ownership         • Subsystem Code
• Flight Element Code         • Problem Description        • Subsystem Description
• Sub Ontologies              • CTags                      • CTag Count

6.3.3 Load


6.3.3.1 SAS® Data Load for Data Visualization 

The completion of the transformed and combined fields brought the NESC assessment team to 
the next phase, which involved converting the data into a file that the Tableau® Desktop software 
could import. Because a standalone version of Tableau® was utilized, a Microsoft® Excel® file
was needed to make the data portable for the Tableau® Reader. This allowed the Tableau® file to
be downloaded to any desktop or laptop to review the entire transformed database. The software
used for the data conversion to Microsoft® Excel® was SAS® Enterprise Guide (EG). This was
the same software that was used for the transformation phase of ETL. Workflows were set up
using EG so that new data could be added or modified as needed. SAS® EG was used during
data refresh to add descriptions to field coding (i.e., cause, defect, failure, subsystem
descriptions), when data were updated, and during vetting.

6.4 Tool Suite 

There is no “perfect” tool for identifying trends or significant anomalies. Overlapping
techniques are necessary to improve results; they are useful when working
with the nonconformance data sets to rule out irrelevant reports, remove duplicates, corroborate
relevant reports, and identify reports that were expected but not found. The resulting data set can
then be counted and presented in time-related trends.
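
The final counting step, after duplicates are removed, can be sketched as below. This is an assumed, simplified approach rather than the team's Tableau® workflow: it treats two rows with the same Database Name and Record Number as duplicates and counts the remaining reports by the year of the Detected Date. The sample records are hypothetical.

```python
from collections import Counter
from datetime import date


def yearly_counts(records):
    """Count distinct reports per detection year, skipping duplicate keys."""
    seen = set()
    counts = Counter()
    for rec in records:
        key = (rec["Database Name"], rec["Record Number"])
        if key in seen:
            continue  # this report was already counted
        seen.add(key)
        counts[rec["Detected Date"].year] += 1
    return dict(sorted(counts.items()))


# Hypothetical records; the last row duplicates the one before it.
records = [
    {"Database Name": "PART IFI", "Record Number": "1", "Detected Date": date(2012, 5, 1)},
    {"Database Name": "MOD AR", "Record Number": "1", "Detected Date": date(2013, 2, 1)},
    {"Database Name": "MOD AR", "Record Number": "2", "Detected Date": date(2013, 7, 9)},
    {"Database Name": "MOD AR", "Record Number": "2", "Detected Date": date(2013, 7, 9)},
]
trend = yearly_counts(records)
```

Note that this simple key does not address reoccurrences appended to an existing record or records upgraded from one system to another, both of which Section 6.2.1 identifies as complications for trending counts.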

The data subteam’s approach was to utilize a merged data set and apply data-mining tools and
techniques to enhance the ability to identify trends and significant anomalies by applying a suite
of capabilities. The methods used to explore nonconformances included (1) search, (2) improved
search by way of adding concepts to anomaly reports (concept tags), and (3) adding failure mode
and defect code fields using “proxy codes” to nonconformance data sets that did not have them
(i.e., MOD AR and PART IFI). Several tools were used for searching and visualizing the data:
Tableau®, Flamenco, and SAS®. Statistical text mining using SAS® identified correlated
documents, based on terms they have in common, to find reports that may be missed using
full-text search. SAS® was also used to update the Aerospace Ontology, which was used in
conjunction with the STAT to develop the concept tags and proxy codes. Flamenco, enhanced to
become Flamenco+, was used for its strength as an open-source faceted search and visualization
tool. Tableau® was used for its strength as an intuitive, state-of-the-art data visualization tool.
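
The idea of correlating documents by the terms they share can be illustrated with a generic cosine similarity over term counts. This is not the SAS® Text Miner method, whose internals are not described here; it is a plain-Python stand-in, and the report texts and IDs are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical report texts.
reports = {
    "R1": "fan bearing noise in water separator",
    "R2": "water separator fan bearing seized",
    "R3": "software fault during file transfer",
}
vectors = {rid: term_vector(text) for rid, text in reports.items()}
sim_12 = cosine_similarity(vectors["R1"], vectors["R2"])
sim_13 = cosine_similarity(vectors["R1"], vectors["R3"])
```

Reports R1 and R2 share four terms and score well above R1 and R3, which share none. A production approach would typically also weight rare terms more heavily and remove stop words.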

6.4.1 Search 

Full-text search is a common information retrieval method when key information for selecting
reports is in text fields. Common search strategies are iterative and interactive to give the user an
opportunity to improve the search query until the sought-for item is found. Using this strategy
with ISS anomaly data sets is useful, yet insufficient by itself; it is relatively easy to judge
whether a report is relevant, but finding the right reports is difficult.

The most common reasons for failing to retrieve reports with search are word variations, which
include synonyms, multiple spellings and misspellings, abbreviations, acronyms, and other
shortened forms. An automatic query reformulation or search expansion strategy could help
overcome the problem of word variations if these variations can be collected from the text in the
data set. STAT and SAS® also provided spelling correction and stemming to base forms (e.g.,
“closing” changed to “close”). This collection strategy was used early in the development of the
merged data sets prior to the utilization of data-mining tools. Simply using search on merged
data sets added value compared with searching nonconformance databases separately. The same
results could have been achieved by searching each database separately and combining the
results; however, that approach would have been considerably more cumbersome and time
consuming.
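
The search expansion strategy described above can be illustrated with a minimal Python sketch.
It is not the assessment team's implementation; the variant table, the report texts, and the
function names are hypothetical and exist only to show how collected word variations could be
applied to a query.

```python
import re

# Hypothetical variant table: base form -> spellings, abbreviations, and other variants.
# In practice these would be collected from the text of the merged data set.
VARIANTS = {
    "leak": ["leak", "leaks", "leaking", "leakage", "lk"],
    "valve": ["valve", "valves", "vlv"],
}

def expand_query(terms):
    """Return each query term together with its known variants."""
    return {term: VARIANTS.get(term, [term]) for term in terms}

def matches(text, terms):
    """True if, for every query term, at least one variant occurs as a whole word."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return all(words & set(variants) for variants in expand_query(terms).values())

# Made-up report texts for illustration only.
reports = {
    "R1": "VLV found leaking during checkout",
    "R2": "Valve cover scratched",
    "R3": "Leakage at quick disconnect",
}

hits = [rid for rid, text in reports.items() if matches(text, ["leak", "valve"])]
```

In this made-up example, an exact-word search for both “leak” and “valve” would retrieve none of
the three reports, whereas the expanded query retrieves R1.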

6.4.2 Data Mining to Enhance Search 

Figure 6.4.2-1 shows the activities the NESC data subteam performed to enhance the
nonconformance reports by adding proxy defect and failure codes and “concept tags.” It shows
the stages of transformation from the original data sources to the final merged data views,
including enhanced search, visualized using Tableau® and Flamenco+. Some data sources
(e.g., GFE PRACA and PART PRACA) had problem reporting codes (e.g., failure mode codes 
and defect codes) that could be selected from pull-down lists. PART IFI and MOD AR data 
sources did not have failure mode or defect codes. These data fields were created for PART IFI 
and MOD ARs using proxy codes, which enable searching with these codes simultaneously 
across all data sets. 



[Figure 6.4.2-1 image not reproduced. Recoverable box labels: Original data sources (PART PRACA,
PART IFI, etc.); Merge data; Initial merged data; Mine text description fields; Aerospace
Ontology; STAT; Concept tags; need proxy codes; Merged data; Search and Views for Experts
(Tableau, Flamenco+).]


Figure 6.4.2-1. View of NESC Data Subteam Activities 


The Aerospace Ontology and STAT were used to develop concept (topic) tags. The concept tags 
were intended to enrich each anomaly report by adding relevant concepts or topics to individual 
nonconformance reports, improving the ability to group nonconformances when searching. The 
concept tags are assigned based on analysis of the text from unstructured (i.e., free-text) fields: 
the Problem Title and Problem Description fields. Likewise, rules for assigning proxy codes 
were developed using the concept tags. The concept tags and proxy codes were added to the 
merged data set and used directly in the data views in Flamenco and Tableau®.

6.4.2.1 Aerospace Ontology, Concept Tagging, and Proxy Codes

6.4.2.1.1 Concept Tagging

Semantic text mining with STAT and the Aerospace Ontology identifies and tags reports with 
the concept-topics that are mentioned in the Problem Title and Problem Description text fields of 
each report. The goal of concept tagging is to provide discipline experts with better access to 
pertinent ISS anomaly data by extracting concept-topics from free-text fields so they can be used 
to index, search, and filter reports in the merged data set. 

The Aerospace Ontology concept-topics are equivalent to a common index for all reports in the 
merged data set. Each concept-topic in the Aerospace Ontology is associated with a list of terms 
(words and phrases) and variants that represent that concept so that it can be matched with 
nonconformance free-text fields. These indexing concepts are robust to many variations in the 
way topics are expressed in text. The Aerospace Ontology contains thousands of indexing 
concepts and tens of thousands of terms, which have been developed over years of effort, most 
recently with GFE nonconformance records (i.e., DRs). The structure of concepts in the
Aerospace Ontology is hierarchical and is organized at the top level into sub-ontologies for types
of properties, objects, actions, and problems in the aerospace domain. 

Prior to using the Aerospace Ontology to develop concept tags, concepts and terms were added 
to the Aerospace Ontology for the ISS nonconformance domain. Methods were developed and 
used successfully to semiautomatically identify new terms and variants (from the merged data 
set) to add to the ontology. Lexical analysis of the vocabulary in the merged data set, described
in Appendix B, identified about 170,000 words and phrases to consider. A matching and
frequency-ranking method, described in Appendix C, identified a set of fewer than 350 new terms
that had priority to be added to the Aerospace Ontology. The version of the Aerospace Ontology
that was used for indexing by text mining included these terms, as well as others identified 
during preliminary vetting of proxy codes. A spreadsheet-based procedure for adding new 
concepts and terms to the Aerospace Ontology is described in Appendix D. 
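
The general idea of matching vocabulary against the ontology and ranking the unmatched terms by
frequency can be sketched as follows. This is only an illustration under stated assumptions: the
actual method is the one described in Appendix C, and the function name, the known-term set, and
the sample texts here are hypothetical.

```python
import re
from collections import Counter

def candidate_terms(texts, known_terms, top_n):
    """Count words that are not already known and return the most frequent ones."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in known_terms)
    return counts.most_common(top_n)

# Made-up texts and a made-up set of terms assumed to be in the ontology already.
texts = [
    "bracket cracked near bracket weld",
    "weld porosity on bracket",
    "valve leak",
]
known = {"valve", "leak", "near", "on"}

ranked = candidate_terms(texts, known, top_n=2)
```

A reviewer would then vet the highest-ranked candidates before adding them to the ontology, which
keeps the process semiautomatic rather than fully automatic.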

Semantic text mining with STAT identifies and tags reports with the Aerospace Ontology 
concept-topics. STAT performs spelling correction and parses the content of the text fields to 
derive syntactic phrase structures with nouns, verbs, and associated modifiers. STAT then finds 
semantic (meaning) matches to concept-topics, based on lists of words or phrases that are 
associated with each concept. These matches are used to identify types of problems, objects, and 
properties in the text. One or more problem, object, or property concept-topics can tag each text 
field in each anomaly report. The results of the text analysis — a set of concept-topics associated 
with text fields in each data record — are output in table formats that were integrated into the 
merged data set. 
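
A greatly simplified Python sketch of concept tagging is shown here. It assumes plain
whole-phrase matching against a hypothetical three-concept ontology; STAT itself performs
spelling correction and syntactic parsing, which this sketch does not attempt to reproduce. The
concept names, term lists, and record fields are illustrative assumptions.

```python
import re

# Hypothetical miniature lexicalized ontology: concept-topic -> terms and variants.
ONTOLOGY = {
    "Joint": ["joint", "slip joint", "coupling"],
    "Leak": ["leak", "leakage", "leaking"],
    "Valve": ["valve", "vlv"],
}

def tag_text(text):
    """Return the sorted concept-topics whose terms occur as whole words or phrases."""
    lowered = text.lower()
    tags = set()
    for concept, terms in ONTOLOGY.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                tags.add(concept)
                break
    return sorted(tags)

def tag_record(record):
    """Tag the title and description fields separately, one tag list per field."""
    return {field: tag_text(record.get(field, "")) for field in ("title", "description")}

# Made-up record for illustration only.
record = {
    "title": "Leakage at coupling",
    "description": "VLV 3 observed leaking near the slip joint.",
}
tags = tag_record(record)
```

Keeping the tags per field mirrors the report's practice of tagging the Problem Title and Problem
Description fields separately, so the output can be stored as additional columns in a table.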

STAT matches and indexes the Aerospace Ontology words and phrases by using the stemmed 
base forms of discrepancy words. This simplifies the matching to search for BAD or NO nouns 
or verbs. For example, “inadvertently closed” would be simplified to “BAD close.” Near 
matches such as “incompletely closed” would also be a type of “BAD close.” The phrase
structures of each sentence guide the association of words within the phrases, such as in a case
where there are intervening words between “inadvertent” and “close.” This simplifying strategy
improves performance but can merge types of bad properties that need to be distinct. The
resulting concept tag distinguishes the type of operation/function better than the type of problem
property. This weakness can be remedied in the future by text analysis changes or by using a
negative property dimension in faceted search.
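
The simplification to “BAD” or “NO” base forms can be sketched in Python as follows. The word
lists are hypothetical (in particular, the words treated as “NO” markers are an assumption), and
a simple word scan stands in for STAT's phrase-structure analysis.

```python
# Hypothetical word lists; STAT's actual lexicon and parser are not reproduced here.
BAD_MODIFIERS = {"inadvertent", "inadvertently", "incomplete", "incompletely"}
NO_MODIFIERS = {"no", "not", "never"}
BASE_FORMS = {
    "close": "close", "closed": "close", "closing": "close",
    "open": "open", "opened": "open", "opening": "open",
}

def simplify(phrase):
    """Reduce a discrepancy phrase to 'BAD <base>' or 'NO <base>', or None if no match."""
    marker = None
    base = None
    for word in phrase.lower().split():
        if word in BAD_MODIFIERS and marker is None:
            marker = "BAD"
        elif word in NO_MODIFIERS and marker is None:
            marker = "NO"
        elif word in BASE_FORMS and base is None:
            base = BASE_FORMS[word]
    if marker is None or base is None:
        return None
    return f"{marker} {base}"
```

With these lists, “inadvertently closed,” “incompletely closed,” and “inadvertent early closing”
all reduce to “BAD close,” which also illustrates the weakness noted above: distinct kinds of bad
property are merged into one tag.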

In practice, STAT does not tag all of the ontology concepts in the text. Some concepts are too
general. Others are unlikely to be of interest to the analyst. The configuration specifies a set of 
intermediate-level concepts (the “start-with-nodes”). STAT tags these concepts and the concepts 
below them. 
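
The effect of the “start-with-nodes” configuration can be sketched with a small hypothetical
hierarchy. The concept names and the parent table are illustrative assumptions, not content of
the Aerospace Ontology.

```python
# Hypothetical fragment of a concept hierarchy: concept -> parent (None for a top-level node).
PARENT = {
    "Problem": None,
    "Object": None,
    "Equipment Part": "Object",
    "Joint": "Equipment Part",
    "Valve": "Equipment Part",
    "Leak": "Problem",
}

# Hypothetical configuration: tag this intermediate-level concept and everything below it.
START_WITH_NODES = {"Equipment Part"}

def is_taggable(concept):
    """True if the concept is a start-with node or lies below one in the hierarchy."""
    node = concept
    while node is not None:
        if node in START_WITH_NODES:
            return True
        node = PARENT.get(node)
    return False

taggable = sorted(concept for concept in PARENT if is_taggable(concept))
```

In this configuration the overly general “Object” concept is skipped, while “Equipment Part” and
its descendants are tagged.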

This text-mining approach was used and its accuracy verified in a previous project on analysis 
and text mining of DRs from QARC. For more detail, see reference 2. 

6.4.2.1.2 Development of Proxy Codes 

STAT and the Aerospace Ontology were also used to develop proxy codes. The set of 
Aerospace Ontology concept-topic tags associated with each data record was used to develop 
concept-based rules for proxy failure mode and defect code fields. The proxy codes provide
substitute failure mode and defect codes in the MOD AR and PART IFI data sets, where manual
problem reporting codes are not built in. Analysts can search for similar records using these
codes simultaneously across data sets. GFE PRACA trend codes were chosen as the standard
codes for all data sets. These were chosen rather than the PART PRACA trend codes because
there were fewer possible GFE PRACA trend codes, and they were more recent versions of the 
failure mode and defect codes. 

Synthetic codes would serve as proxies for the missing manual codes. A plausible approach to 
generating these proxy codes was to define classification rules, using “and/or/not” logic based on
the presence or absence of specified concept tags. The concept tags could be used as inputs to 
the rules. These rules would classify each trend code in a record into one or more proxy code 
values. Allowing more than one proxy code value per trend code could be useful in overcoming 
the problem of constraining manual codes to only one code per field when two or more codes 
would have improved the search for trends. 
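
A minimal Python sketch of such “and/or/not” classification rules follows. The rule table is
hypothetical: the code names are placeholders rather than actual GFE PRACA trend codes, and the
concept tags are illustrative.

```python
# Hypothetical rule table. Code names are placeholders, not actual GFE PRACA trend codes.
RULES = {
    "PROXY_LEAK": {"all": {"Leak"}, "any": {"Valve", "Joint"}, "none": {"Software"}},
    "PROXY_SW_FAULT": {"all": {"Software"}, "any": set(), "none": set()},
    "PROXY_STUCK": {"all": set(), "any": {"BAD close", "NO open"}, "none": set()},
}

def rule_fires(rule, tags):
    """Evaluate one rule against the set of concept tags present in a record."""
    has_all = rule["all"] <= tags                            # AND: every listed tag present
    has_any = not rule["any"] or bool(rule["any"] & tags)    # OR: at least one listed tag
    has_none = not (rule["none"] & tags)                     # NOT: no excluded tag present
    return has_all and has_any and has_none

def assign_proxy_codes(tags, rules=RULES):
    """Return every proxy code whose rule fires; a record may receive several codes."""
    tag_set = set(tags)
    return sorted(code for code, rule in rules.items() if rule_fires(rule, tag_set))
```

For example, `assign_proxy_codes({"Leak", "Joint", "NO open"})` returns both `PROXY_LEAK` and
`PROXY_STUCK`, which reflects the point above that more than one proxy code value may be
assigned.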

Several approaches for defining proxy codes were tried, including a statistical machine learning 
approach for defect codes. Proxy codes were assigned based on concept tags from the text in the 
Problem Title or the Problem Description field. Preliminary estimates of proxy code recall 
(i.e., the proportion of records with a particular GFE PRACA manual code found with the
corresponding proxy code assigned) were about 30 percent. This rate is similar to the estimated
likely manual recall (if assessed by trained judges, allowing multiple code values). The highest
precision (i.e., proportion of assigned codes that matched a particular GFE PRACA manual
code) for defect proxy codes was 0.27 (Mean = 0.10) and for failure mode proxy codes was
0.84 (Mean = 0.16). To improve precision, records with more than five proxy code assignments
were reduced to the five codes with the highest precision in the initial measure of proxy code
precision. More detailed descriptions of methods for proxy code development and refinement
are provided in Appendix A (Section A.3.3) and Appendix F.
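
The recall and precision measures defined above, and the reduction to the five most precise
codes, can be sketched in Python as follows. The record identifiers, code labels, and function
names are hypothetical; the actual evaluation scripts are those described in the appendices.

```python
def precision_recall(manual, proxy, code):
    """Compute (precision, recall) for one code.

    manual and proxy map record identifiers to sets of codes.
    """
    assigned = {rid for rid, codes in proxy.items() if code in codes}
    actual = {rid for rid, codes in manual.items() if code in codes}
    hits = len(assigned & actual)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall

def keep_most_precise(codes, precision_by_code, limit=5):
    """Keep at most `limit` codes for a record, preferring codes with higher precision."""
    ranked = sorted(codes, key=lambda c: precision_by_code.get(c, 0.0), reverse=True)
    return ranked[:limit]

# Made-up codings for four records.
manual = {1: {"A"}, 2: {"A"}, 3: {"B"}, 4: {"A"}}
proxy = {1: {"A", "B"}, 2: {"B"}, 3: {"A"}, 4: set()}

precision_a, recall_a = precision_recall(manual, proxy, "A")
```

In this made-up example, proxy code A is assigned to two records and is correct for one of them,
and it finds one of the three records that carry manual code A.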

The inherent limitations and inaccuracy of the manual trend codes made it difficult to develop 
accurate classification rules for the proxy codes. These limitations and possible remedies are 
discussed further in Appendix F (Section F.5). 

6.4.2.2 Search Using Concept Tags 

Rather than building proxy codes from concept tags, the more promising approach is to use 
concept tags directly to search and browse. The concept tags were concatenated into a single 
string and made into a concept tag data field in Tableau® so that Tableau® users could search 
with Aerospace Ontology concepts to find anomaly records with similar attributes. The 
Tableau® visualization tool (see Section 6.4.3.1) can present multiple dimensions but does not
currently support hierarchical faceted search as seen in Flamenco+. Tableau® performance 
problems have prevented full use of this search strategy. 

The concept tags that are extracted from text fields are also a source of dimensions for faceted 
search. Faceted search combines keyword search with browsing in a multidimensional 
(i.e., “multifaceted”) hierarchical space. Analysts can begin with a classic keyword search and 
then scan the list of results while inspecting a display of related dimensions that provides insights 
into the content and its organization. The purpose of faceted search is to help the analyst 
determine quickly what types of attributes or dimensions are available and the counts of reports 
that contain concepts in those dimensions (see Section 6.4.3.2). The dimensions partition the
items in multiple ways so that each anomaly report can be a member of several different groups 
of related reports. Combinations of dimensions and search within groups can filter sets of related 
reports into more specific subsets that target the trends of interest to the analyst. 
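
The combination of keyword search, per-dimension counts, and refinement can be sketched in
Python as follows. This is not Flamenco code; the records, facet names, and functions are
hypothetical and only illustrate the interaction pattern.

```python
from collections import Counter

# Made-up records; the facet names and values are illustrative only.
REPORTS = [
    {"id": "R1", "text": "slip joint leak at coupling",
     "facets": {"object": ["equipment part"], "problem": ["leak"]}},
    {"id": "R2", "text": "joint locking pin bent",
     "facets": {"object": ["equipment part", "physical interface component"],
                "problem": ["deformation"]}},
    {"id": "R3", "text": "software timer drift",
     "facets": {"object": ["software"], "problem": ["timing"]}},
]

def keyword_search(reports, word):
    """Classic keyword search: keep reports whose text contains the word."""
    return [r for r in reports if word.lower() in r["text"].lower().split()]

def facet_counts(results):
    """Count, for each facet, how many results carry each facet value."""
    counts = {}
    for report in results:
        for facet, values in report["facets"].items():
            counts.setdefault(facet, Counter()).update(values)
    return counts

def refine(results, facet, value):
    """Narrow the current result set to reports with the selected facet value."""
    return [r for r in results if value in r["facets"].get(facet, [])]

results = keyword_search(REPORTS, "joint")
counts = facet_counts(results)
narrowed = refine(results, "problem", "leak")
```

Each refinement step applies one more constraint to the current result set, so a sequence of
small selections has the same effect as a compound Boolean query.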

Concept tags were implemented near the end of the assessment and not utilized enough to fully 
test their efficacy. They are incorporated in Flamenco+ and Tableau and at the very least will 
improve the ability to perform deep dives on particular topics where a search needs to be as 
comprehensive as possible. Given that, it is also expected that the concept tags will help 
improve the overall speed and accuracy of performing searches in general. 

6.4.2.3 Search Using Proxy Codes 

The purpose of proxy codes is to approximate what would have been assigned by a manual entry 
in the data sources where manual problem reporting codes were not used (i.e., MOD AR and 
PART IFI). If all merged data records included these codes, similar records from all sources 
could be retrieved in a similar manner. Data fields needing proxy codes were identified 
(i.e., failure mode codes and defect codes). STAT and the Aerospace Ontology were used to 
match concept (topic) tags with text in the title and description fields. The project used the four 
data sets to develop and evaluate proxy codes. The inherent limitations of the original codings
(see Section 6.4.2.1) were expected to limit their usefulness for data discovery, and this was
found to be true. The limitations primarily stemmed from the inadequacy of the existing manual
condition codes found in the existing data sets (i.e., GFE and PART PRACA). Significant manual
coding errors were found during the process of developing the proxy code rules. Even though the
team attempted to overcome this, preliminary estimates of proxy code recall (i.e., the proportion
of records with a particular GFE PRACA manual code found with the corresponding proxy code
assigned) were about 30 percent.

A more detailed description of methods for proxy code development and refinement is presented
in Appendix A (Section A.3.3) and Appendix F.

6.4.2.4 Statistical Text Mining Using SAS

The purpose of the SAS® analysis text-mining phase was to identify reports for specific 
suspected problem areas, disciplines, or subsystems that could not be found easily with keyword 
search. Discipline experts specified lists of terms and noun groups that defined areas of focus. 
Statistical text mining was used to identify correlated documents, based on terms and noun 
groups they had in common. Each group of correlated documents represents a latent topic, 
which is defined by the common terms. Thus, new terms or noun groups could be identified to 
add to search expressions, if desired. The analysis was used to determine significant 
observations or trends that needed further investigation. For detailed information on SAS®
analysis, see Appendix G. This approach proved useful for identifying potential areas of interest
by grouping similar anomaly topics. However, because the methodology used in SAS® is
statistical and based on word frequency, many of the clusters of anomalies identified turned out
to be uninteresting. Consequently, wading through identified clusters is time consuming and
often uninformative.
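
The idea of correlating documents by the terms they share can be sketched in plain Python using
cosine similarity over term counts. This is a swapped-in, simplified technique chosen for
illustration; it is not the SAS® Text Miner algorithm, and the stop word list, threshold, and
documents are made up.

```python
import math
import re
from collections import Counter

# Hypothetical stop word list for this small example.
STOPWORDS = {"a", "the", "of", "during", "after"}

def term_vector(text):
    """Count the non-stop-word terms in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if word not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def correlated_pairs(docs, threshold):
    """Return (id1, id2, similarity) for document pairs at or above the threshold."""
    ids = sorted(docs)
    vectors = {doc_id: term_vector(docs[doc_id]) for doc_id in ids}
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = cosine(vectors[first], vectors[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs

# Made-up documents for illustration only.
docs = {
    "D1": "pump bearing noise during fan startup",
    "D2": "fan bearing noise increased after startup",
    "D3": "software display froze during update",
}
pairs = correlated_pairs(docs, threshold=0.5)
```

Because the score depends only on shared word counts, two reports can be grouped for reasons that
are of no engineering interest, which is consistent with the limitation noted above.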

6.4.2.4.1 SAS® 

SAS® advanced analytics software packages used in this assessment were SAS® Enterprise 
Miner, SAS® Text Miner, and SAS® Enterprise Guide. SAS® Enterprise Guide was used to 
combine and transform the five different data sources, as discussed in Section 6.3.3.1. These
software products were also used during the analysis phase for lexical analysis and to perform 
text mining to identify topics that could be used to find relevant reports that might be missed in 
search. 

6.4.3 Data Visualization 

A key enabler of data trend analysis is to have an effective tool for users to query the data and to 
visualize the output. This assessment used two complementary data query and visualization 
tools: Tableau® and Flamenco+.

6.4.3.1 Tableau® 

Tableau®, a commercial off-the-shelf (COTS) tool, was used for its strength as an intuitive,
state-of-the-art data visualization tool. Tableau® Desktop is a multi-platform software program
procured to assist the NESC assessment data team developers in implementing data visualization.
Tableau® Reader is freeware used to connect to the data sources (i.e., merged data sets), which
were built using Tableau® Desktop. Tableau® Reader was used by the discipline experts and the
data subteams. Tableau® Desktop provided the capability for querying, calculating, code
generating, and graph building for the construction of data visualization dashboards, saved as
Tableau® files. Tableau® Reader is used to interact with these files, providing viewing,
querying, filtering, sorting, exporting, and printing capabilities. This facilitates the interactive
visualization of the files produced by the Tableau® Desktop component.

The data visualization dashboards were designed and developed to provide quick access to the 
multidimensional aspects of the information contained in the nonconformance reports. The 
dashboard shown in Figure 6.4.3-1 depicts six zones of interest: one primary query zone and five
display zones. Following the numbering on the figure, Zone 1 is a text entry area used to query
the combined data sets. Zone 2 summarizes record counts over a trending timeframe (a record is 
a single nonconformance record, e.g., PART record 9202) showing occurrences detected per year 
and total records per database. Zone 3 is the records table, which includes title, description, and 
link to the original record database. Zone 4 contains various other counts, such as a count by 
part number and a count by cause codes. Zone 5 shows records related to the currently selected 
record, as well as an ability to filter records by cause, defect, or failure mode. Zone 6 contains 
the concept tags and includes a text entry area to filter the concept tags down to those tags 
containing the entered text, and additionally filters all other zones on the dashboard 
simultaneously. See Appendix E for more details on the zones in the figure and for further 
explanation of the use of Tableau®. The user manual in Section E.1 of Appendix E details
additional Tableau® dashboard functionality. 

The merged data set provides the ability to trend across a broader data set, and Tableau® makes it 
straightforward and intuitive to view the data. There is overlap between the data sets that often 
skews the counts, however. This adds burden to the user to manually remove duplicates once 
identified. For instance, at times a nonconformance identified in the MOD AR data set results in
a nonconformance in the PART IFI data set, which may then end up in one of the GFE data sets.

Tableau® has proven useful for search and discovery. It is also valuable for exporting 
data/information to other tools such as Microsoft® Excel®, where additional cleanup, reduction, 
or formatting can be performed.

Figure 6.4.3-1. Data Visualization Dashboard

6.4.3.2 Flamenco

Flamenco, enhanced to become Flamenco+, was used for its strength as an open-source faceted 
search and visualization tool. “The faceted search model leverages metadata fields and values to 
provide users with visible options for clarifying and refining queries. It features an integrated, 
incremental search and browse experience that lets users begin with a classic keyword search and 
then scan a list of results (or do additional search). It also serves up a custom map (usually to the 
left of results) that provides insights into the content and its organization and offers a variety of 
useful next steps. That’s where faceted navigation proves its power. In keeping with the 
principles of progressive disclosure and incremental construction, users can formulate the 
equivalent of a sophisticated Boolean query by taking a series of small, simple steps. Faceted 
navigation addresses the universal need to narrow a search. Consequently, this pattern has 
become nearly ubiquitous in e-commerce, given the availability of structured metadata and the 
clear business value of improving product find-ability” [ref. 3]. 

The Flamenco+ faceted search environment is customized to show concept facets in the area to
the left of the results of a faceted search. The search for “joint” in Figure 6.4.3-2 identifies
46 reports where the word “joint” appears in the text, and two “joint” (as a noun) concept-topics,
one from the Title field and one from the Problem Description field in GFE PRACA records.
Clicking on these links will lead directly to the set of reports that are tagged with this Joint
concept tag. This will identify reports where the word “joint” or one of its 19 variants appears in
the text. The variants include such terms as “SARJ,” “slip joint,” “join,” and “coupling.”



Figure 6.4.3-2. Results of a Flamenco+ Keyword Search for “Joint”

The facets on the left of the figure provide a custom map of the concepts associated with the
46 reports retrieved by keyword search, in order of frequency. Under each facet can be seen 
other classifications that are associated with “joint”; in the “Title Tags: Nouns” facet, the user 
can see that “joint” cross-cuts many concepts (e.g., “equipment part,” “physical interface 
component,” “energy or power”). From here, the user may want to select “equipment part” 
under “Title tags,” the most frequent category, to refine the query. Alternatively, the user can 
choose to perform another search (e.g., for “locking”) to refine the 46 results further. This 
scenario is discussed in detail in Appendix E. 

The facets were designed to navigate by selected concept dimensions (from the Aerospace 
Ontology) or by type of code (failure mode and defect code). In this design, there are six ways 
to browse or filter based on the type of concept tags in various text fields (i.e., title or description 
field x object/noun, property, or problem). Six more facets support vetting of proxy codes
(i.e., title or description field x manual or proxy x failure mode or defect code). These facets are
illustrated in Appendix E, Figure E-14. Many other facet designs are possible for the ISS
anomaly data set. 

Due to the late incorporation of proxy tags, Flamenco+ was not utilized for search during the
assessment but is available for use going forward. Flamenco should be optimized for search
using concept tags.

6.5 Products Used, Purchased, and/or Developed 

6.5.1 Data Sets and Data Set Documentation 

6.5.1.1 ISS Anomaly Data Sets 

The final ISS anomaly data set included: 

• Combined anomaly records, as depicted in Figure 6.3-1.

o Including Failure and Defect proxy codes that were added to records originally
without these codes.
o Including concept tags.

6.5.1.2 Aerospace Ontology Data 

STAT semantic annotation or “tagging” relates parts of the text to concepts in the Aerospace
Ontology, a lexicalized ontology. In a lexicalized ontology, each concept is associated with a list 
of words or phrases that are possible text representations of the concept. The Aerospace 
Ontology is implemented in Protege. The final Aerospace Ontology version that supported 
STAT processing and delivery of concept tags and proxy codes is AO 1.31 (.owl) and Version 
1.31 Aerospace Ontology (.xml). Versions of the Aerospace Ontology that were developed 
during this project (in both .owl and .xml formats), in addition to V 1.31, are only available upon 
request. Please contact the NESC at NESC@nasa.gov.

6.5.1.3 STAT Text Mining Result Data

6.5.1.3.1 Concept-topic Tags

These tags are available upon request. Please contact the NESC at NESC@nasa.gov. 

6.5.1.3.2 Proxy Codes

These codes are available upon request. Please contact the NESC at NESC@nasa.gov. 

6.5.2 Software and Software Reference Documentation 

The following items were purchased, or downloaded as open source: Protege, SAS®, Tableau®,
and Flamenco.

6.5.2.1 Data and Text Mining Software 

Protege: Protege is open-source software for editing ontologies and building intelligent 
systems. The software (V4.3) can be downloaded at
http://protege.stanford.edu/products.php#desktop-protege.

Plugins for spreadsheet-based updating, XML output, and acronym checking are available upon
request. Please contact the NESC at NESC@nasa.gov. 

SAS® Software Tools 

The following SAS® software tools were purchased under this assessment:

SAS® Analytics Pro v9.4: SAS® Analytics Pro 9.4 is the foundation of Base SAS® that houses 
the SAS® (data management facility, programming language, data analysis, and reporting) database
and programs (Enterprise Guide, Enterprise Miner, and Text Miner). 

SAS® EG v6.1: This point-and-click interface generates code to manipulate data or perform
analysis automatically and does not require SAS® programming experience to use. SAS® EG
provided the functionality to perform extract, transform, and load (ETL) functions that brought
the data into a homogeneous data structure. Because the data resided in many different
heterogeneous databases and formats, SAS® EG helped to facilitate the extraction of data from
many Microsoft® Excel® files, the transformation of these data into a more homogeneous data
structure, and the export of the data into Tableau®-readable files for Tableau® visualizations.
SAS® EG also provided the path to update the files from the data sources early in the workflow
by putting several parameters into place. This expedited the entire process.

SAS® Enterprise Miner v13.1: SAS® Enterprise Miner streamlines the data-mining process to
create predictive and descriptive models based on analysis of vast amounts of data. Enterprise 
Miner and Text Miner provided capabilities to explore and discover information found in the 
many textual data fields. This enabled the consolidation of the information into concepts and 
clusters. 

SAS® Text Miner v13.1: SAS® Text Miner tools enable information extraction from a collection
of text documents to uncover the themes and concepts.

STAT Semantic Text Analysis Tool

STAT is a syntactic parser and semantic interpreter and tagger implemented in Perl and Lisp that 
uses flat files as input and output. A STAT tar file is available upon request. Please contact the 
NESC at NESC@nasa.gov. 

Ontology Updating Software 

The Guide for updating the Aerospace Ontology based on lexical corpus analysis is contained in 
Appendix C of this report. 

The Python software for performing this updating is available upon request. Please contact the 
NESC at NESC@nasa.gov. 

Proxy Code Development and Evaluation Software 

Python scripts were developed to export Flamenco+ concept tags, generate proxy code rules, and
evaluate their precision and recall. This software is available upon request. Please contact the 
NESC at NESC@nasa.gov. 

6.5.2.2 User Interface and Visualization Software

Tableau® Software: Tableau® Desktop is a multi-platform, COTS software program procured to 
assist the NESC assessment team developers in implementing data visualization. 

Tableau® Reader: Tableau® Reader is freeware used to connect to data sources (merged data 
sets) that were built using Tableau® Desktop. 

Flamenco+: Flamenco is a search interface framework implemented in Python using a MySQL
database and is available at http://flamenco.berkeley.edu/index.html.

Flamenco+ was developed to enhance the user interface and output capabilities for searching and
browsing problem reports and other NASA short documents. A Flamenco+ tar file is available
upon request. Please contact the NESC at NESC@nasa.gov.

6.5.3 Guides and Training Products 

6.5.3.1 User Guides 

6.5.3.1.1 Flamenco+ User Guide and Tutorial

A Flamenco+ User Guide and Tutorial is available upon request. Please contact the NESC at
NESC@nasa.gov.

6.5.3.1.2 Tableau® Dashboard Tutorial

The Tableau® tutorial is contained in Appendix E, Section E.1, of this report.

6.5.3.1.3 Data Mining Site Users Guide

The “ISS Data Mining Site Construction Guide” is contained in Appendix H of this report.

6.5.3.2 Developer Guides

6.5.3.2.1 Ontology Customization Guide 

The Ontology Customization Guide is contained in Appendix D of this report. 

Previously developed user guides for inspecting and updating the Aerospace Ontology are 
available upon request. Please contact the NESC at NESC@nasa.gov. 

6.5.3.2.2 STAT Analysis Tutorial and User Guide

A STAT Analysis Tutorial and User Guide are available upon request. Please contact the
NESC at NESC@nasa.gov.

6.5.3.2.3 Flamenco+ Setup Guide

A previously developed guide to setting up Flamenco+ is available upon request. Please contact
the NESC at NESC@nasa.gov. 

7.0 Analysis Results 

7.1 Results of Discipline Analysis 

The NESC assessment data team performed initial search and analysis for several systems as 
requested by a subset of the discipline experts. Initial search and analysis means that the team 
applied the data tools to the data sets for specific ISS subsystems or problem sets and extracted 
what appeared to be trends and/or significant anomalies. Determinations of significance are left 
to the discipline experts. The trends may or may not be significant, and the anomalies may be 
significant but may turn out to be well understood and previously dispositioned. 

The following ISS subsystems (or discipline areas) had some initial search and analysis 
performed: Environmental Control and Life Support Systems (ECLSS), mechanisms, software,
electrical power, and human factors. Brief summaries of each are provided below. Some of the
search and analysis was performed broadly using standard and enhanced search techniques,
where the focus was not necessarily to capture every nonconformance related to a particular
issue. In other cases, the NESC assessment team was asked to examine a specific issue and
performed a deeper dive (i.e., a more exhaustive search for focused areas), such as ECLSS and
mechanisms. For these cases, search, enhanced search, and statistical text mining were used.

EMU: The NESC assessment team was asked to search for anomalies related to the EMU fan 
bearings, meaning any nonconformances against the fan, pump, and separator bearings. Dating 
back to 1979, 44 related nonconformances were identified in the GFE PRACA, MOD AR, and
PART IFI databases. Seven were identified as “possibly” interesting, 26 as “probably” 
interesting, 10 as “definitely” interesting, and 1 as not interesting. This deep dive utilized SAS® 
standard and enhanced search to improve the likelihood that all related anomalies were 
identified. The data are available upon request. Please contact the NESC at NESC@nasa.gov.

Mechanisms: The NESC assessment team was asked to search for anomalies related to
peristaltic and harmonic drive pumps. Tableau® was used to perform search and the information 
was provided to the mechanisms discipline experts. 

Software: The NESC assessment team worked with the NASA Technical Fellow for Software to
identify trends related to software anomalies. The NASA Technical Fellow for Software was
looking for supporting data to define the state of the discipline across the Agency. For example,
trends as seen in Figure 7.1-1 were provided for use. This figure identifies failures related to ISS
computers over the 5-year period from 2009 through 2014. Additional information on software 
failures can be obtained by contacting the NESC at NESC@nasa.gov. 



Figure 7.1-1. Trends of “ISS Computers” Failures from 2009 to 2014 

Human Factors: The NESC assessment data team also worked with the NASA Technical 
Fellow for Human Factors and the Human Factors TDT Deputy to provide high-level trends for
consideration. For instance, some of the identified trends indicated areas where astronauts are
doing repeated work. These might benefit from improvements in processes instead of technical
fixes. Another example can be seen in Figure 7.1-2. Nonconformances with either “smoke” or
“fire” and “alarm” show an increasing trend over both an 11-year and a 5-year period using a
quadratic trend curve fit. The human factors team may consider whether the recent uptick is
significant and whether any actions are warranted.

Figure 7.1-2. Nonconformances Containing “Smoke” or “Fire” and “Alarm”

Electrical Power: SAS® data and text mining tools were utilized to begin investigating failure 
trends in electrical power. The electrical power subsystem was a test case for developing a
process using SAS® text mining that might be applied to other ISS subsystems analysis. This 
effort was not completed and may or may not prove beneficial. Additional explanation is 
provided in Appendix G. Results are not ready to be reported at this time. 

Tool Suite Results: Simply using search on merged data sets added value compared with 
searching nonconformance databases separately. The same results could have been achieved by 
combining these search results using the latter approach; however, that would have been 
considerably more cumbersome and time consuming. 

The merged data set improves the ability to trend across the broader data set, and Tableau® 
makes it straightforward and intuitive to view and parse the data. This allows the users to 
investigate counts and trends, as well as perform data exploration. However, some overlap 
between the data sets often skews the counts. For instance, at times a nonconformance identified 
in the MOD AR data set results in a nonconformance in the PART IFI data set, which may then 
end up in one of the GFE data sets. This adds burden to the user to manually remove duplicates, 
once identified. 
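
The duplicate-removal burden described above could be reduced with an automated first pass. The sketch below is one possible heuristic, not a method used in this assessment: it groups records by a normalized title and reports groups that span more than one data set. The record keys ("id", "dataset", "title") and the title-matching rule are assumptions for illustration. Records that were retitled when elevated from one system to another would not be caught this way.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase the title and drop punctuation so near-identical titles compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def find_cross_dataset_duplicates(records):
    """Return {normalized title: records} for titles found in more than one data set.

    Each record is a dict with the hypothetical keys "id", "dataset", and "title".
    """
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_title(rec["title"])].append(rec)
    return {
        key: recs
        for key, recs in groups.items()
        if len({rec["dataset"] for rec in recs}) > 1
    }


def deduplicated_count(records):
    """Count records, counting each group of same-titled records only once."""
    return len({normalize_title(rec["title"]) for rec in records})
```

Candidate groups returned by `find_cross_dataset_duplicates` would still need review by a discipline expert before any record is excluded from a count.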

Tableau® has proven useful for search and discovery. It is also valuable for exporting 
data/information to other tools, such as Excel®. 

7.2 Data Enrichment Results 

Concept tags were added to the merged data set near the end of the assessment and were not 
utilized enough to test their efficacy. They are incorporated in Flamenco+ and Tableau® and, at 
the very least, will improve the ability to perform deep dives on particular topics where a search 
needs to be as comprehensive as possible. Given this, it is also expected that the concept tags 
will help improve the overall speed and accuracy of performing searches in general. 

Proxy codes were also added to the merged data set near the end of the assessment. Testing was 
performed on the proxy codes, and limitations were identified that primarily stemmed from the 
inadequacy of the existing manual condition codes found in the existing data sets (i.e., GFE and 
PART PRACA). Significant manual coding errors were found during the process of developing 
the proxy code rules. Although the team attempted to overcome this, preliminary estimates of 
proxy code recall were about 30 percent. 
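
For readers unfamiliar with the term, recall is the fraction of reference code assignments that the proxy rules also produce. The sketch below shows one way such a figure could be computed. The report does not describe the team's exact procedure, so the function name, the per-record sets of codes, and the micro-averaged counting are assumptions. Because the manual codes themselves contained errors, any recall measured against them is only approximate.

```python
def recall_and_precision(reference_codes, proxy_codes):
    """Compare proxy codes with reference codes for the same records.

    Both arguments map a record id to a set of codes. Records that appear
    only in proxy_codes are ignored. Returns (recall, precision).
    """
    true_pos = false_pos = false_neg = 0
    for rec_id, reference in reference_codes.items():
        predicted = proxy_codes.get(rec_id, set())
        true_pos += len(reference & predicted)   # codes found by both
        false_neg += len(reference - predicted)  # reference codes the proxy missed
        false_pos += len(predicted - reference)  # proxy codes not in the reference
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    return recall, precision
```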

Flamenco+, an open-source search and visualization tool for multidimensional search, was 
customized to use the hierarchical indexes provided by STAT and the Aerospace Ontology for 
the data sets. Flamenco+ was also adapted for evaluating codes. Corresponding STAT 
adaptations were made to provide output to support use of Flamenco+ for evaluation of proxy 
codes. Due to the late incorporation of proxy tags in the merged data set, Flamenco+ was not 
utilized for search during this assessment but is available for use going forward. Flamenco+ 
should be optimized for search using concept tags. Integrated use of Flamenco+ and Tableau® 
was not explored but is feasible and promising. 

SAS® was used to perform statistical text mining on the merged data sets, focusing on specific 
subsystems and/or classes of anomalies. This was partially successful. This approach proved 
useful for identifying potential areas of interest by grouping similar anomaly topics. However, 
since the methodology used in SAS® is statistically based on word frequency, many of the 
clusters of anomalies identified turned out to be uninteresting. Consequently, wading through 
identified clusters is time consuming and often uninformative. 
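
The limitation of frequency-based grouping can be seen in a small stand-in example. The sketch below is not the SAS® algorithm. It substitutes a simple technique (term-frequency vectors, cosine similarity, and greedy assignment to the first sufficiently similar cluster) to show how records are grouped purely on shared words. The function names and the 0.5 threshold are assumptions for illustration.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Return a word-count vector for the text (lowercased, alphanumeric tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_clusters(texts, threshold=0.5):
    """Group texts by word overlap; returns lists of indices into texts.

    Each text joins the first cluster whose seed text is similar enough,
    otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed vector, member indices)
    for index, text in enumerate(texts):
        vector = term_frequencies(text)
        for seed, members in clusters:
            if cosine_similarity(seed, vector) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]
```

Because grouping depends only on shared vocabulary, records that repeat common procedural wording can cluster together even when the underlying problems are unrelated, which is consistent with the uninformative clusters noted above.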

7.3 Topic of Interest 

7.3.1 Relating System Hazards and Causes with Problem or Anomaly Occurrences 

NASA S&MA organizations desire the ability to associate a potential issue or hazard to the 
historical operational anomalies or failures that could have led to the realization of the hazard. 

A system or operational hazard is defined as a risk condition that arises during operation(s) that 
can potentially lead to a loss of assets, mission, or personnel. Associating operational anomalies 
with those risk conditions can aid in understanding how those risks develop during operations 
and lead to better ways to prevent their development. 

In the vast majority of documented cases, the occurrence of an anomaly or failure does not 
ultimately lead to a catastrophic consequence described by a hazard. However, it is logical to 
conclude that the occurrence of anomalies or failures during system operation should be related 
to the likelihood of occurrence of accidents or mishaps that result in the realization of a hazard 
(i.e., loss of assets or personnel). That is, documented occurrences of anomaly incidents such as 
those contained in the merged data set described in this study may be used to identify “close 
calls” or “precursors” to future catastrophic events. 

This section describes a methodology to use the ISS anomaly data sets and search capabilities 
described in this study to identify and cluster for further analysis the anomalies associated with 
individual ISS hazards. 

7.3.1.1 Use Case Objective 

The objective is to provide a way to relate anomaly record information from the study’s ISS 
merged anomaly data set to potential risks and hazards defined in the ISS Hazard Analysis 
System. 

A hazard defines a potential risk/mishap that can occur during operation(s). Within NASA, 
system hazards are described through the use of hazard analyses and reports, with underlying 
standardized hazard description wording. Given a hazard of concern or interest, it is desired to 
develop a compilation of the historical anomalies that have occurred that could have led to the 
occurrence of the hazard, and potentially rank the hazard risk by the number of related anomaly 
counts. (Related anomalies can be regarded as “precursors” to individual hazard occurrences.) 
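
A minimal sketch of the ranking step follows, assuming the related anomaly records for each hazard have already been found by search. The function name and the hazard and record identifiers in the usage note are hypothetical. Record ids are de-duplicated per hazard so that a record retrieved by more than one search term is counted once.

```python
def rank_hazards_by_precursors(related_anomalies):
    """Rank hazards by the number of distinct related anomaly records.

    related_anomalies maps a hazard id to an iterable of anomaly record ids.
    Returns (hazard id, count) pairs, highest count first; ties are broken
    alphabetically by hazard id.
    """
    counts = {hazard: len(set(ids)) for hazard, ids in related_anomalies.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, `rank_hazards_by_precursors({"HAZ-A": ["r1", "r2"], "HAZ-B": ["r3"]})` places the hypothetical hazard HAZ-A first. A raw count is only a starting point, since it does not weigh the severity of each anomaly or the overlap between data sets.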

7.3.1.2 Method 

ISS hazard information is currently available in the NASA ISS Hazard Data System, and user 
access to that system will be necessary to obtain the hazard information. The system allows 
access to ISS hazard reports in portable document format, so detailed information about 
individual hazards must be manually obtained by reading the reports. Figure 7.3.1-1 shows the 
search page image associated with the ISS Hazard Data System that can be used to retrieve 
specific hazard analysis reports. The user may search for hazard reports on subsystems/ 
payloads, hardware categories, or several other fields. For instance, if a user is interested in the 
“Hazard” record type associated with the “ECLSS (Environmental Control and Life Support 
Subsystem)” payload for the “Assembly Complete (AC)” ISS flight applicability, the user would 
select the options as shown in Figure 7.3.1-1. The NASA Hazard Data System will then retrieve 
the relevant hazard report files. Once a desired hazard report is retrieved, the report will need to 
be read to extract the relevant information to search for related anomalies. 

Figure 7.3.1-1. NASA ISS Hazard Data System Search Page 


The primary information needed from the hazard reports is a description of the hazard causes 
and, perhaps, the associated controls. In many cases, a hazard cause, as stated, is analogous to a 
failure mode of an item or component that can lead to the realization of the hazard. In other 
cases, the hazard control section will identify the items or components whose failure jeopardizes 
the prevention of the hazardous event. This combination of a cause/failure mode with the 
associated component can then be used to map into the integrated anomaly database search 
capability. 


From the hazard cause statements and/or the hazard cause control statements, a specific system 
component or item should be described, whose anomalous behavior can be attributed as 
potentially causing the hazard to occur. Relevant statements will have a syntactic form or phrase 
such as: a “failure (of some type or mode)” of a “component or item” during some operation can 
lead to the occurrence of the hazard. 

The use case objective here is to use these identified component and failure characteristics to 
find a related set of recorded anomalies from the integrated ISS database using the search and 
retrieval tools developed during this study. 

Example 

In this example, the hazard of interest or concern is the ISS hazard report with the title “IVA 
Crewmember Exposure to Inadequate Respirable Atmosphere,”¹ with the associated hazard 
condition description of “Failure to maintain atmosphere partial pressure of oxygen and nitrogen 
within proper limits resulting in personnel injury/death.” 

The report identifies three associated causes for the hazard: 

• Cause 1. Low partial pressure of oxygen due to crew metabolic usage. 

• Cause 2. Leakage/rupture of nitrogen distribution/transfer system. 

• Cause 3. Inadvertent/excessive nitrogen introduction or release through the nitrogen 
pressure relief valve. 

Note that causes 2 and 3 already have the structure of “failure mode” of some “component or 
item.” However, further descriptions of components and failure modes are found in the Controls 
section of the hazard report. The following cause control descriptions are excerpted from the 
hazard report: 

• Cause 1 Controls: Intermodule ventilation will be established at the beginning of each 
ingress activity, by fans, and ducting between the Service Module (SM), Functional 
Cargo Block, pressurized mating adapters, Node 1, United States (U.S.) Laboratory, the 
airlock, and the orbiter. Control of oxygen levels will be performed by either the orbiter, 
while open to the station, or the SM. After orbiter departure, the airlock and the U.S. Lab 
will provide control of oxygen levels, introducing oxygen by use of a high-pressure 
oxygen tank external to the airlock and a pressure control assembly (PCA), which 
introduces oxygen into their volume via an oxygen introduction valve (OIV). The 
Intermodule Ventilation disperses oxygen throughout the ISS. 

• Cause 2 Controls: The United States On-orbit Segment Nitrogen Distribution System is 
composed of three subsystems: Supply, Recharge, and Low Pressure Distribution. The 
nitrogen system components (i.e., recharge and distribution) are designed with either 
metal-to-metal or dual O-ring seals at joint interfaces (i.e., quick disconnects or Gamah 
fittings). A single elastomer seal exists in the PCA nitrogen introduction valve (NIV). 

• Cause 3 Controls: PCA NIVs, located in the U.S. Laboratory and the airlock, are 
initialized closed and normally remain in the closed position. Each PCA can be 
configured to automatically introduce nitrogen based on the total cabin pressure 
measured by the cabin pressure sensor. In the automatic mode, the NIVs will be 
commanded open if the total cabin pressure falls below a threshold. The NIVs will 
remain open until the threshold is reached. Also, each NIV can be manually opened or 
closed by the crew, or remotely commanded by the crew/ground. NIVs remain in the last 
commanded position. The PCA is a “must work” function. 

¹ The hazard number for this example is ISS-ECL-0206-AC. 

Cursory review of the wording in the Cause or Controls sections can identify several potential 
components/items that are important to the system operations and that contribute to inhibiting the 
hazard occurrence. Selected entities are: 

• PCA 

• OIV 

• NIV 

• Nitrogen distribution/transfer system 

• Nitrogen pressure relief valve 

• Cabin pressure sensor 

• High-pressure oxygen tanks 

In addition, various failure modes identified include: 

• Low [partial] pressure 

• Leakage/rupture 

• Inadvertent/excessive [gas] introduction or release 

The analyst or engineer involved in this process should also have the system knowledge to elicit 
or infer other component/items and failure modes/causes for review purposes. 

Using the Tableau® Search Capability 

Searches are performed using the Tableau® capability to access the integrated ISS data set, based 
on the context described above. For example, Figure 7.3.1-2 shows the main Tableau® search 
screen that results with the terms pca, oiv, and niv used as search parameters. 

Figure 7.3.1-2. Tableau® Search Screen with Three Search Parameters 

In this case, the search retrieved 165 records that contained one of the three search parameters, 
40 of which were from the PART PRACA data set, 26 from the IFI data set, 24 from the MOD 
AR data set, and 75 from the GFE PRACA data set. A selected portion of the anomalies with 
titles and descriptions that are related to the three components pointed to by the hazard report is 
shown in Figure 7.3.1-3. 

Low Pressure O2 Regulator Internal Leakage 

On about GMT 200 04:00 after Oxygen tank installation once the Low and High Pressure Supply 
Valves were opened, it was noted that the oxygen pressure from the Lab and Airlock PCA 
oxygen pressure sensors was steadily increasing. This was analyzed to be an internal leak 
through the oxygen regulator assembly (in excess of 0.025 scc/s or about 25 times above 
specification) of about 113kPa/day (16.4 psi/day). A/L OIV was opened causing the regulator to .. 

On about GMT 2001/200 04:00 after Oxygen tank installation once the Low and High Pressure 
Supply Valves were opened, it was noted that the oxygen pressure from the Lab and Airlock PCA 
oxygen pressure sensors was steadily increasing. This was analyzed to be an internal leak 
through the oxygen regulator assembly (in excess of 0.025 scc/s or about 25 times above 
specification) about 113kPa/day (16.4 psi/day). A/L OIV was opened causing the regulator to .. 

Oxygen System Supply Line Depressurization 

During troubleshooting of the Low Pressure Regulator Leak on GMT 228 an unexplainable 
pressure signature caused a halt in troubleshooting until the pressure signature is understood. 
The troubleshooting proceeded as follows: 1) Supply valve closed (GMT 228:00:13) 2) OIVs in Lab 
and Airlock opened to reduce line pressure to ambient (GMT 228:00:15:00) 3) Supply valve 
opened to “slam” regulator (GMT 228:00:20) 4) System allowed to flow for 10 minutes 5) OIVs d.. 

PCA Cabin Pressure Sensors Out of Spec 

The Lab and Airlock PCAs total pressure reading exceed 0.02 psi (0.01 accuracy for each PCA, 
for a worst case difference of 0.02 psi) when the Station is equalized between the two. At Airlock 
activation the difference was about 0.04 psi. The probable cause of this is helium contamination. 
Back in early 2001, the Airlock was leak tested. This testing included the addition of a certain 
amount of helium into the cabin atmosphere of the Airlock. Nominally the PCP (Pressure Contr.. 

PCA Change from Monitor to Safe Mode 

For moving the Soyuz vehicle from the SM aft docking position to the FGB TK port the Station 
was configured as follows: Both Lab hatches were closed, and the Lab aft port and stbd IMV 
valves were open. All (6) Node 1 hatches were closed, and the aft port and forward port and stbd 
IMV valves were open. The FGB hatches were closed. The SM hatches, except the PKO/RO hatch, 
were closed. The Soyuz undocked from the SM aft position and at GMT 2001/055:10:37 docked a.. 

Unexpected PCA Inlet Pressure Alarms 

During the initial purges N2 system, a PCA Low N2 Pressure high alarm at 127 psia was set. The 
alarm is set to 120 psia and the maximum spec lockup pressure is 145 psia (with a 14.7 cabin) of 
the pressure regulators. When we are not flowing then the pressure could rise above alarm with 
in spec regulator performance. This also applies to the O2 system and at all PCAs. 


Figure 7.3.1-3. Anomaly Text Information Results for Associated Hazard Components/Items 
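
The search described above (three terms, with a per-data-set breakdown of 40 + 26 + 24 + 75 = 165 records) can be approximated outside Tableau® with a short script. The sketch below is an illustration under stated assumptions, not the assessment team's tool: the record keys ("dataset", "title", "description"), the whole-word matching rule, and the optional plural "s" are all hypothetical choices.

```python
import re
from collections import Counter


def build_pattern(terms):
    """Compile a case-insensitive, whole-word pattern matching any term (optional plural s)."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")s?\b", re.IGNORECASE)


def search_records(records, terms):
    """Return records whose title or description contains at least one search term."""
    if not terms:
        return []
    pattern = build_pattern(terms)
    return [
        rec for rec in records
        if pattern.search(rec["title"] + " " + rec["description"])
    ]


def counts_by_dataset(hits):
    """Count retrieved records per source data set."""
    return Counter(rec["dataset"] for rec in hits)
```

Whole-word matching avoids false hits on longer words that merely contain a search term, at the cost of missing spelled-out forms such as “oxygen introduction valve” unless they are added as search terms.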

The associated failure mode descriptions are also provided by the Tableau® search and are shown 
in Figure 7.3.1-4. The failure modes of interest from the hazard report deal with low pressure, 
leakage/rupture, or inadvertent release. From the information in Figure 7.3.1-4, several of the 
24 records with descriptions such as EXTERNAL/INTERNAL LEAKAGE, STRUCTURE 
FAILURE, or PREMATURE OUTPUT map to these kinds of failure modes, and the user may 
select these particular records for more detailed examination to determine how closely the 
anomalies relate to the hazard conditions presented in the hazard report. 

Failure Mode Description                                  Count 
Fails Open or Fails to Close (Retract) Completely         2 
Out of Tolerance (Function)                               1 
Not Applicable                                            1 
Temp / Pressure High                                      1 
Other                                                     1 
Improper Installation                                     2 
STRUCTURE FAILURE                                         1 
EXTERNAL LEAKAGE                                          2 
Incorrect Part                                            1 
UNSATISFACTORY CONDITION                                  1 
Hardware Not Per Drawing                                  2 
INTERNAL LEAKAGE                                          1 
Hardware Not Per Drawing                                  1 
Intermittent Operation                                    2 
FAILS ON                                                  1 
Premature Inadvertent Output (Operation) or Shutdown      1 

Figure 7.3.1-4. Failure Mode Descriptions Associated with Identified Anomaly Records 

As an additional source of information, the associated part information resulting from the 
Tableau® search is shown in Figure 7.3.1-5. This part information helps to corroborate that the 
components/items associated with the anomalies are those of interest (i.e., those found in the 
hazard report). 


Part Number    Part Description                   Database Name    Count 
683-16421-6    LOW PR O2 REG/RLF VLV ASSY         PART PRACA       1 
2353052-1-1    Oxygen/Nitrogen Isolation Valve    PART PRACA 
               OXYGEN/NITROGEN ISOLATION VALVE    PART PRACA       1 

Figure 7.3.1-5. Part Numbers and Descriptions Associated with Identified Anomaly Records 

Observations 

The methodology discussed above provides a tool for S&MA personnel, as well as other 
interested organizations, to identify incidents that have occurred in the past that could have led to 
a critical or catastrophic mishap or event. These incidents may be reviewed, counted, and 
trended to raise awareness and assess whether preventive actions would be prudent. The ability 
to search across several databases to identify the appropriate incidents is a key attribute to 
finding a more complete set of incidents for analysis. 

7.4 Description of Future Analysis Plans 

The original plan called for the NESC assessment team to identify ISS trends and/or significant 
anomalies. This work was not completed, due largely to the cleanup, merging, and data-mining 
efforts being more challenging and time consuming than expected. The assessment lead will 
work through the Systems Engineering TDT and the NESC Review Board to develop a plan 
going forward. 

8.0 Findings, Observations, and NESC Recommendations 

8.1 Findings 

The following findings were identified: 

F-1. The expected goals and outcomes of the data-mining effort determine which data sets and 
fields are required. For example, importing problem descriptions was essential for 
performing problem trends. However, disposition and corrective action text fields, where 
available, would likely have been helpful but were not carried over to the merged data set 
and, therefore, were not available for trend analysis. 

F-2. On occasion, searching the merged data set can result in over-counting frequencies and 
trends. PART IFIs are often elevated and repeated in PART PRACAs. On occasion, AR 
records with reoccurrences of a problem in a single data record can result in under- 
counting frequencies and trends. 

F-3. Proxy code development efforts that were based on manual trend codes proved not to be 
effective because too many errors were found in the manual trend codes. 

F-4. Visualization tools can be successfully customized for querying and filtering the merged 
data set and displaying query results in multiple displays for the user. 

• Demonstrated with Tableau® and Flamenco+. 

F-5. The tool suite developed in this assessment showed promise in supporting discipline 
experts in performing deep investigations into technical issues. 

8.2 Observations 

O-1. The data analysis team demonstrated the ability to create a searchable, merged problem 
data set from multiple problem reporting systems by overcoming problems/limitations 
between fields and dissimilar field values for individual data sets. 

O-2. Concept tags based on modifications to the Aerospace Ontology were created for all of 
the records in the merged data set. These tags were integrated into the merged data set 
late in the assessment and were not fully evaluated. 

O-3. Within the Tableau® Desktop framework, the merged data set may have reached its 
performance limits, so that expanding the number of records, faceted search, or 
visualization capabilities will require server-based systems. 

O-4. Standard query-type searches are limited in that they will not catch multiple synonyms, 
alternate spellings, abbreviations, and acronyms. 

O-5. Complexity in user interfaces for search requires users to have additional training to 
maximize the benefits of information retrieval. 

O-6. The assessment was not able to fully realize strategies for information retrieval based on 
multidimensional faceted search. 

O-7. STAT strategies for tagging of complex phrases sometimes obscure properties that are 
important search terms. Words such as “inadvertent” are merged into a generic “bad” set 
of variants applied to operations. The resulting concept tag emphasizes the type of 
operation/function rather than the type of problem. 

O-8. Lexical analysis and text filtering can be refined so that a second round of data cleaning 
is avoided during review of candidate words and phrases for the Aerospace Ontology 
vocabulary. 

O-9. The SAS® text-mining process can be redesigned for improved recall and precision by 
including concept tags. 

O-10. It was demonstrated that anomaly record information from the ISS merged data set can be 
related to potential risks and hazards defined in the ISS Hazard Analysis System. 

8.3 NESC Recommendations 

The following NESC recommendations are directed to future users or implementers of this tool 
suite, or to developers who will merge and trend across multiple data sets. These 
recommendations are intended to achieve a robust tool suite framework and data set for analyses 
of anomaly groups and trends. 

R-1. In future data mining efforts across multiple reporting systems, carefully align the 
objectives and expected outcomes of the investigation with the selected problem 
reporting systems and their reporting processes, recognizing the possibility of duplicate 
records. (F-1) 

R-2. To perform more accurate problem counts and trends across the PART and AR data sets, 
develop methods and capabilities to aid the user in merging, associating, or eliminating 
duplication to support the goals of the trending. (F-2) 

R-3. The Agency should develop a minimal set of common data fields and field values that are 
clearly defined for use in problem reporting data sets. (O-1) 

R-4. Consider using query reformulation. Variant lists can be included in the user interface so 
that if one of the words or phrases in the list is entered as a search term, others in the list 
can be offered. The user can review these and build a better query. Updates to the 
Aerospace Ontology should include additional variants for these data sets. (O-4) 

R-5. Integrate the concept tags from STAT/Aerospace Ontology into information retrieval 
strategies in the search and visualization tools and evaluate their effectiveness. 
(F-3, O-9) 

R-6. Explore strategies where faceted search uses hierarchy in the data to look ahead and filter 
and, thus, complements search and filtering in the visualization tool. (O-6) 

R-7. Improve processing of complex expressions in text and use a negative properties facet to 
provide better indexing of types of problems. (O-7) 

R-8. Develop look-ahead strategies with dimensional partitions (facets) for quick browsing, 
summaries, conceptual metadata, and accessible information on the types of data in each 
data source. These dimensions should be specified to highlight common features of 
nonconformances. (O-6) 

R-9. Investigate further, with the ISS S&MA community, the use of the merged data set and 
tool suite developed during this assessment to gain a better understanding of how past ISS 
operations reflect on existing ISS hazards. (O-10) 
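
The query reformulation suggested in R-4 can be sketched in a few lines. The example below is a hypothetical illustration only. The variant lists are invented for the example (they are not taken from the Aerospace Ontology), and the function names are assumptions. Given a search term, it offers the other members of any variant list that contains the term, and it can build an expanded query from the accepted variants.

```python
# Hypothetical variant lists; a real list would come from the ontology vocabulary.
VARIANT_LISTS = [
    {"leak", "leakage", "leaking"},
    {"o2", "oxygen"},
    {"niv", "nitrogen introduction valve"},
]


def suggest_variants(term, variant_lists=VARIANT_LISTS):
    """Return, in sorted order, the other variants from every list containing the term."""
    key = term.strip().lower()
    suggestions = set()
    for variants in variant_lists:
        if key in variants:
            suggestions |= variants - {key}
    return sorted(suggestions)


def expand_query(terms, variant_lists=VARIANT_LISTS):
    """Return the original terms followed by their variants, without repeats."""
    expanded = []
    for term in terms:
        key = term.strip().lower()
        for candidate in [key] + suggest_variants(key, variant_lists):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

In an interactive tool, the output of `suggest_variants` would be offered to the user for review rather than applied automatically, so that the user keeps control over how broad the query becomes.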

9.0 Alternate Viewpoint 

There were no alternate viewpoints identified during the course of this assessment by the NESC 
team or the NRB quorum. 

10.0 Other Deliverables 

In addition to this final report of findings, observations, and associated recommendations 
regarding ISS significant anomalies and/or trends, the following deliverables were provided to 
the stakeholders: 

• The current ISS anomaly data set, accessible by way of a tool suite, to include the 
graphical user interface. 

• Training and reference material documenting lessons learned and configuration of the 
tool suite to support any future trending activities beyond ISS. 

11.0 Lessons Learned 

11.1 Preventing Errors in Problem Reporting Codes 

11.1.1 Description 

Fields for manually assigning problem reporting codes were included in some of the databases in 
the ISS anomaly data set. The coding schemes for types of failure modes and defects produced 
coding errors and made search by codes less effective. The coding errors were discovered while 
designing rules for generating proxies for these codes based on content in the text fields in the 
reports. In the GFE PRACA and PART PRACA data sets, manual coding errors were much 
worse than expected. 

Several types of coding errors can occur when a coder: 

• Misinterprets code definitions (Help text) or is unable to fill in gaps in short definitions. 

• Misinterprets how to assign codes to multiple condition fields, especially when there is 
some overlap. 

• Misinterprets text description in report or cannot guess missing information in the report. 

• Chooses a nonspecific code. 

o Varying reluctance to commit to a specific code. 
o Appropriate code not found in set. 

• Uses only a subset of codes to handle difficult coding schemes. 

• Copies a code from a related report (which may be incorrect). 

Many problem reporting codes are not clearly defined. The definitions (Help text) are brief and 
confusing. No guidance is given on what code assignment should be used when multiple 
alternative codes are possible. Data overload for users occurs because the code sets are large and 
multilayered, with complex, inconsistently structured fields and codes. Types of relations 
between the fields are not explicit or well-defined. Subtype-super-type relations are mixed with 
other relations in code hierarchies, violating the assumption that all the characteristics of the 
superset are applicable for the members of the subset. 

11.1.2 Corrective and/or Preventive Actions 

Procedures for developing and reviewing coding schemes should be defined, with emphasis on 
clarity and ease of use by both coders and analysts. Codes and problem reporting fields need to 
be well-defined and distinct. Criteria for assigning each code need to be expressed in definitions 
that are long enough for clarity, with sufficient examples and detail. They should be expressed 
in terms that are aligned with the language used in the text fields of the reports. If the coder is 
constrained to select a single code and no secondary codes are allowed, then guidance is needed 
as to what characteristics should be primary or preferred in assigning the code. This information 
should also be available to analysts who use the codes to retrieve records. 

Coding schemes should be evaluated by inter-rater reliability studies before they are released. 
Reproducibility is frequently measured as inter-rater reliability between two or more coders. 
Code selections should be regularly reviewed, and coding errors should be corrected. Results of 
the reviews should be used for updating coding schemes and definitions. Systems for training 
and help should be provided, such as advice and additional information in FAQs. 
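
One common statistic for inter-rater reliability between two coders is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The report does not name a specific statistic, so the choice of kappa here is an assumption, as are the function name and the example codes. A minimal sketch:

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who each assigned one code per record.

    codes_a and codes_b are equal-length sequences; position i holds the
    code each coder assigned to record i.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of records")
    n = len(codes_a)
    # Observed agreement: fraction of records given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: from each coder's own code frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() & freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1.0 - expected)
```

A kappa near 1 indicates that coders apply the scheme consistently, while a value near 0 indicates agreement no better than chance, which would suggest that the code definitions need revision before release.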

12.0 Recommendations for NASA Standards and Specifications 

No recommendations for NASA standards and specifications were identified as a result of this 
assessment. 


13.0 Definition of Terms 


Corrective Actions    Changes to design processes, work instructions, workmanship practices, 
                      training, inspections, tests, procedures, specifications, drawings, tools, 
                      equipment, facilities, resources, or material that result in preventing, 
                      minimizing, or limiting the potential for recurrence of a problem. 

Finding               A relevant factual conclusion and/or issue that is within the assessment 
                      scope and that the team has rigorously based on data from their 
                      independent analyses, tests, inspections, and/or reviews of technical 
                      documentation. 

Lessons Learned       Knowledge, understanding, or conclusive insight gained by experience 
                      that may benefit other current or future NASA programs and projects. 
                      The experience may be positive, as in a successful test or mission, or 
                      negative, as in a mishap or failure. 

Observation           A noteworthy fact, issue, and/or risk, which may not be directly within the 
                      assessment scope, but could generate a separate issue or concern if not 
                      addressed. Alternatively, an observation can be a positive 
                      acknowledgement of a Center/Program/Project/Organization’s operational 
                      structure, tools, and/or support provided. 

Problem               The subject of the independent technical assessment. 

Proximate Cause       The event(s) that occurred, including any condition(s) that existed 
                      immediately before the undesired outcome, directly resulted in its 
                      occurrence and, if eliminated or modified, would have prevented the 
                      undesired outcome. 

Recommendation        A proposed measurable stakeholder action directly supported by specific 
                      Finding(s) and/or Observation(s) that will correct or mitigate an identified 
                      issue or risk. 

Root Cause            One of multiple factors (events, conditions, or organizational factors) that 
                      contributed to or created the proximate cause and subsequent undesired 
                      outcome and, if eliminated or modified, would have prevented the 
                      undesired outcome. Typically, multiple root causes contribute to an 
                      undesired outcome. 

Supporting Narrative  A paragraph, or section, in an NESC final report that provides the detailed 
                      explanation of a succinctly worded finding or observation. For example, 
                      the logical deduction that led to a finding or observation; descriptions of 
                      assumptions, exceptions, clarifications, and boundary conditions. Avoid 
                      squeezing all of this information into a finding or observation. 

14.0 Acronym List 

AMA     Analytical Mechanics Associates, Inc. 
AR      Anomaly Report 
DR      Discrepancy Report 
EG      Enterprise Guide 
EMU     Extravehicular Mobility Unit 
ETL     Extract, Transform, and Load 
EVA     Extravehicular Activity 
GFE     Government-furnished Equipment 
IFI     Items for Investigation 
ISS     International Space Station 
JSC     Johnson Space Center 
KSC     Kennedy Space Center 
LaRC    Langley Research Center 
MADS    Maintenance Analysis Data Set 
MOD     Mission Operations Directorate 
MTSO    Management and Technical Support Office 
NESC    NASA Engineering and Safety Center 
NGO     Needs, Goals, and Objectives 
NIV     Nitrogen Introduction Valve 
NRB     NESC Review Board 
OIV     Oxygen Introduction Valve 
PART    Problem Analysis Resolution Tool 
PCA     Pressure Control Assembly 
PRACA   Problem Reporting and Corrective Action 
QARC    Quality Assurance Record Center 
S&MA    Safety and Mission Assurance 
SCR     Software Change Request 
SEO     Systems Engineering Office 
SM      Service Module 
STAT    Semantic Text Analysis Tool 
TDT     Technical Discipline Team 
U.S.    United States 